{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7a4a40c7-9645-4852-a43a-5b98c9593e2f",
   "metadata": {},
   "source": [
    "# 归并排序"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ad67a1b-b868-4d10-b966-a0d33a6fc47b",
   "metadata": {},
   "source": [
    "归并排序是一种递归算法，每次将一个列表一分为二。如果列表为空或只有一个元素，那么从定义上来说它就是有序的（基本情况）。如果列表不止一个元素，就将列表一分为二，并对两部分都递归调用归并排序。当两部分都有序后，就进行归并这一基本操作。归并是指将两个较小的有序列表归并为一个有序列表的过程。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dbfec7a-d3fc-42ac-a32b-b37c13a11ee0",
   "metadata": {},
   "source": [
    "下图展示了示例列表被拆分后的情况。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df10974a-da7f-47b0-9aeb-cdcf3ae7c70e",
   "metadata": {},
   "source": [
    "![归并排序1](merge_sort1.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ccc62a0f-4cd5-4698-a6de-f017d8349b37",
   "metadata": {},
   "source": [
    "下图给出了归并后的有序列表。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b42fa4e3-e08f-4493-85de-659dc6210c50",
   "metadata": {},
   "source": [
    "![归并排序2](merge_sort2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53b31719-e46b-4128-9f64-760b9ebf006b",
   "metadata": {},
   "source": [
    "在下面代码中，merge_sort 函数以处理基本情况开始。如果列表的长度小于或等于 1，说明它已经是有序列表，因此不需要做额外的处理。如果长度大于 1，则通过 Python 的切片操作得到左半部分和右半部分。要注意，列表所含元素的个数可能不是偶数。这并没有关系，因为左右子列表的长度最多相差 1。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5297bf42-6fdb-4625-bbb6-b9d46e540c5a",
   "metadata": {},
   "outputs": [],
   "source": [
    "def merge_sort(alist):\n",
    "    if len(alist) > 1:\n",
    "        mid = len(alist) // 2\n",
    "        lefthalf = alist[:mid]\n",
    "        righthalf = alist[mid:]\n",
    "\n",
    "        merge_sort(lefthalf)\n",
    "        merge_sort(righthalf)\n",
    "\n",
    "        i = 0\n",
    "        j = 0\n",
    "        k = 0\n",
    "        while i < len(lefthalf) and j < len(righthalf):\n",
    "            if lefthalf[i] < righthalf[j]:\n",
    "                alist[k] = lefthalf[i]\n",
    "                i = i + 1\n",
    "            else:\n",
    "                alist[k] = righthalf[j]\n",
    "                j = j + 1\n",
    "            k = k + 1\n",
    "\n",
    "        while i < len(lefthalf):\n",
    "            alist[k] = lefthalf[i]\n",
    "            i = i + 1\n",
    "            k = k + 1\n",
    "\n",
    "        while j < len(righthalf):\n",
    "            alist[k] = righthalf[j]\n",
    "            j = j + 1\n",
    "            k = k + 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6dbbcd4-2573-4303-a493-4cb6b59ab560",
   "metadata": {},
   "source": [
    "函数首先将原列表分成两个子列表（通过复制），然后在对左右子列表调用 merge_sort 函数后，就假设它们已经排好序了。后面的三个 while 循环负责将两个小的有序列表归并为一个大的有序列表。注意，归并操作每次从有序列表中取出最小值，放回初始列表（alist）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3e360644-fa02-4cd6-88be-1186e4a67aee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "before sorting: [54, 26, 93, 17, 77, 31, 44, 55, 20]\n",
      "after sorting: [17, 20, 26, 31, 44, 54, 55, 77, 93]\n"
     ]
    }
   ],
   "source": [
    "a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]\n",
    "print(f\"before sorting: {a_list}\")\n",
    "merge_sort(a_list)\n",
    "print(f\"after sorting: {a_list}\")\n",
    "assert a_list == sorted(a_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8f307db8-3c39-4f19-91fc-8868ce31efbf",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randint, seed\n",
    "seed(13)\n",
    "lst_to_sort = [randint(100, 999) for _ in range(1000)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "132e9c3f-9160-4c5d-9e96-add39a094bbd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "697 μs ± 18.6 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit merge_sort(lst_to_sort)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0cb757a0-f3f2-4670-b209-d8113c1d118a",
   "metadata": {},
   "source": [
    "分析 merge_sort 函数时，要考虑它的两个独立的构成部分。首先，列表被一分为二。当列表的长度为 n 时，能切分 $log_{2} n$ 次。第二个处理过程是归并。列表中的每个元素最终都得到处理，并被放到有序列表中。所以，得到长度为 n 的列表需要进行 n 次操作。由此可知，需要进行 $logn$ 次拆分，每一次需要进行 n 次操作，所以一共是 $n logn$ 次操作。\n",
    "也就是说，归并排序算法的时间复杂度是 $O(n logn)$ 。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec3a2ebb-36f5-43fa-a993-1603c6c9de4c",
   "metadata": {},
   "source": [
    "你应该记得，切片操作的时间复杂度是 $O(k)$ ，其中 k 是切片的大小。为了保证 merge_sort函数的时间复杂度是 $O(n logn)$ ，需要去除切片运算符。在进行递归调用时，传入头和尾的下标即可做到这一点。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "6bec35dd-42b8-4dd0-ac3a-1628528c06a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def merge_sort(alist):\n",
    "    copybuffer = alist[:]\n",
    "    merge_sort_helper(alist, copybuffer, 0, len(alist) - 1)\n",
    "\n",
    "def merge_sort_helper(alist, copybuffer, low, high):\n",
    "    # alist: list being sorted\n",
    "    # copybuffer: temp space needed during merge\n",
    "    # low, high: bounds of sublist, [low, high]\n",
    "    # middle: midpoint of sublist\n",
    "    if low < high:\n",
    "        middle = (low + high) // 2\n",
    "        merge_sort_helper(alist, copybuffer, low, middle)\n",
    "        merge_sort_helper(alist, copybuffer, middle+1, high)\n",
    "        merge(alist, copybuffer, low, middle, high)\n",
    "\n",
    "def merge(alist, copybuffer, low, middle, high):\n",
    "    # alist: list that is being sorted\n",
    "    # copybuffer: temp space needed during the merge process\n",
    "    # low: beginning of first sorted sublist\n",
    "    # middle: end of first sorted sublist\n",
    "    # middle + 1: beginning of second sorted sublist\n",
    "    # high: end of second sorted sublist\n",
    "    i = low\n",
    "    j = middle+1\n",
    "    k = low\n",
    "    while i <= middle and j <= high:\n",
    "        if alist[i] < alist[j]:\n",
    "            copybuffer[k] = alist[i]\n",
    "            i = i + 1\n",
    "        else:\n",
    "            copybuffer[k] = alist[j]\n",
    "            j = j + 1\n",
    "        k = k + 1\n",
    "\n",
    "    while i <= middle:\n",
    "        copybuffer[k] = alist[i]\n",
    "        i = i + 1\n",
    "        k = k + 1\n",
    "\n",
    "    while j <= high:\n",
    "        copybuffer[k] = alist[j]\n",
    "        j = j + 1\n",
    "        k = k + 1\n",
    "    \n",
    "    for i in range(low, high + 1):\n",
    "        alist[i] = copybuffer[i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b194eef1-8111-4e9f-9ea8-bc44facacbdc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "before sorting: [54, 26, 93, 17, 77, 31, 44, 55, 20]\n",
      "after sorting: [17, 20, 26, 31, 44, 54, 55, 77, 93]\n"
     ]
    }
   ],
   "source": [
    "a_list = [54, 26, 93, 17, 77, 31, 44, 55, 20]\n",
    "print(f\"before sorting: {a_list}\")\n",
    "merge_sort(a_list)\n",
    "print(f\"after sorting: {a_list}\")\n",
    "assert a_list == sorted(a_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "dffae5f7-0b98-4425-904a-9162c88e99ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randint, seed\n",
    "seed(13)\n",
    "lst_to_sort = [randint(100, 999) for _ in range(1000)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "8516527a-cb79-4e83-9a20-c02be5d600ca",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "865 μs ± 13 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)\n"
     ]
    }
   ],
   "source": [
    "%timeit merge_sort(lst_to_sort)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f87b033-9ba5-471a-8360-2d644d950b4d",
   "metadata": {},
   "source": [
    "有一点要注意：merge_sort 函数需要额外的空间来存储切片操作得到的两半部分。当列表较大时，使用额外的空间可能会使排序出现问题。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30f0f64d-11b1-4a4f-b13e-14e745f39dfc",
   "metadata": {},
   "source": [
    "### 参考文档\n",
    "- 《Python数据结构与算法分析（第2版）》：5.3.5 归并排序\n",
    "- 《数据结构Python语言描述》：3.5.2 合并排序"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
